POLICY IMPROVEMENT IN MARKOV DECISION PROCESSES AND MARKOV POTENTIAL THEORY
نویسندگان
چکیده
منابع مشابه
Online Markov decision processes with policy iteration
The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions. In this paper, we propose practical online MDP algorithms with policy iteration and theoretically establish a sublinear regret bound. A notable advantage of the proposed algorithm is that it can be easily combined with function approximation, and thu...
متن کاملCascade Markov Decision Processes: Theory and Applications
This paper considers the optimal control of time varying continuous time Markov chains whose transition rates are themselves Markov processes. In one set of problems the solution of an ordinary differential equation is shown to determine the optimal performance and feedback controls, while some other cases are shown to lead to singular optimal control problems which are more difficult to solve....
متن کاملAccelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
متن کاملBounded Parameter Markov Decision Processes Bounded Parameter Markov Decision Processes
In this paper, we introduce the notion of a bounded parameter Markov decision process as a generalization of the traditional exact MDP. A bounded parameter MDP is a set of exact MDPs speciied by giving upper and lower bounds on transition probabilities and rewards (all the MDPs in the set share the same state and action space). Bounded parameter MDPs can be used to represent variation or uncert...
متن کاملLearning Qualitative Markov Decision Processes Learning Qualitative Markov Decision Processes
To navigate in natural environments, a robot must decide the best action to take according to its current situation and goal, a problem that can be represented as a Markov Decision Process (MDP). In general, it is assumed that a reasonable state representation and transition model can be provided by the user to the system. When dealing with complex domains, however, it is not always easy or pos...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Bulletin of Mathematical Statistics
سال: 1978
ISSN: 0007-4993
DOI: 10.5109/13123